ABSTRACT: PURPOSE: To retrieve a desired sound source from among the plural sound sources, and 
to attain the data display by a retrieval interface equipped with a virtual sound field space 
and a visualizing means corresponding to the virtual sound field. 

CONSTITUTION: In a voice information retrieval system, an access to the desired sound 
source is performed from among the plural sound sources based on the direction and 
distance of the desired sound source by the retrieval interface equipped with the virtual 
sound field space and the visualizing means corresponding to the plural sound sources in 
the sound field space. Then, the visualizing means appears at each prescribed point by 
moving in a video and the sound field space with a device such as a mouse, and at the 
time of reaching the desired sound source, the video of the sound source and the 
characteristic are outputted. A mouse inputting part which inputs the instruction of a user, 
picture interface part which stores and offers a picture, and audio control part which 
controls pronunciation are connected with a main control part which controls the entire 
system, and an audio outputting part is connected with the audio control part, and a 
picture outputting part is connected with the picture interface part. 
Abstract 
Purpose 

The purpose of the present invention is to provide a sound information retrieval system, 
which contains a retrieval interface equipped with a virtual sound field space and a 
corresponding visualizing means by which means the desired sound source is retrieved from a 
plurality of sound sources and the data display is realized. 

Constitution 

In the sound information retrieval system, the desired sound source is accessed from a 
plurality of sound sources on the basis of the direction and distance of the desired sound source, 
by means of a retrieval interface equipped with a virtual sound field space and a visualizing 
means corresponding to the plurality of sound sources of the sound field space. Then, by moving 
in an image and sound field space, by means of a mouse or the like, the visualizing means 
appears at each prescribed point. When the desired sound source is reached, the image of the 
sound source and its characteristics, etc. are output. The mouse inputting part which inputs the 
instructions from the user, an image interface part that stores and provides images, and an audio 
control part which controls pronunciation are connected to a main controller which controls the 
entire system, and an audio output part is connected to said audio control part, and an image 
output part is connected to the image interface part. 




Key: 1 Audio output part 
2 Audio control part 
3 Main controller 
Mouse inputting part 
Image interface part 
Window part 

Claims 

1. A sound information retrieval system characterized in that: by means of a retrieval 
interface equipped with a virtual sound field space and a visualizing means corresponding to the 
plurality of sound sources of the sound field space, the desired sound source is accessed from 
said plurality of sound sources on the basis of the direction and distance of the desired sound 
source; in that, by moving in an image and sound field space by means of a mouse or the like, the 
visualizing means appears at each prescribed point; and in that when the desired sound source is 
reached, the image of the sound source and its characteristics, etc. are output. 

2. The sound information retrieval system described in Claim 1 characterized in that the 
virtual sound field space is a space for arranging sound sources within the human audible range. 

3. The sound information retrieval system described in Claim 1 characterized in that the 
virtual sound field space is arranged in a hierarchy, and that the accessed regions can be 
narrowed down. 

4. The sound information retrieval system described in Claim 1 characterized in that the 
visualizing means includes a radar window, a bird's-eye view window, a 3D window, and a data 
display window. 

5. A sound information retrieval device characterized in that a mouse inputting part 
which inputs instructions from the user, an image interface part that stores and provides images, 
and an audio control part which controls the pronunciation are connected to a main controller 
which controls the entire system; in that an audio output part is connected to said audio control 
part; and in that an image output part is connected to the image interface part. 

Detailed explanation of the invention 
[0001] 

Industrial application field 

The present invention pertains to a sound information retrieval system and device 
characterized by the fact that the desired sound source is retrieved from a plurality of sound 
sources by means of a retrieval interface equipped with a virtual sound field space and the 
corresponding visualizing means, and a display of its data is obtained. 
[0002] 
Prior art 

The so-called linking method is the simplest and most often used method for handling 
expanded functions of multimedia data where the database is mainly for alphanumeric data. In 
this method, the multimedia data are linked to one or more alphanumeric data, and when said 
data are retrieved under certain retrieval conditions, the linked image data and sound data are 
also provided to the user. 

[0003] 

On the other hand, there is the index method. According to this method, the index 
retrieval in the conventional database is directly adopted in the multimedia information as is. The 
file no. or data name of each piece of data and its address in the storage device in which the data 
are stored are paired to form an indexing table. During retrieval, the user inputs the file number 
or data name into the retrieval system, and the system checks the input file no. or data name with 
the index to obtain the address of the data in the storage device, and it then outputs the desired 
data. 
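
The index method described above can be sketched as a simple table lookup. This is only a minimal illustration: the table contents, the name `index_table`, and the function `lookup_address` are hypothetical and do not come from the original text.

```python
# Hypothetical index table pairing a data name with its storage address.
# The names and addresses are illustrative values only.
index_table = {
    "track01": 0x0000,
    "track02": 0x4A00,
    "track03": 0x9C00,
}


def lookup_address(name, table):
    """Return the storage address paired with a file number or data name."""
    if name not in table:
        # Without the exact name or number, the index cannot be consulted.
        raise KeyError(f"no index entry for {name!r}")
    return table[name]
```

A call such as `lookup_address("track02", index_table)` returns the paired address, while an unknown name raises `KeyError`, which reflects the fact that the user must already know the file number or data name.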

[0004] 

A typical example of a system that uses said indexing method is a CD. Each CD has track 
numbers that correspond to the music selections (data). The user reads the table (index) pairing 
the data and the track numbers to find the track number of the music selection to be played. By 
inputting this number in the CD player, the desired music can be played. As a result, the CD 
itself is an excellent sound database. 

[0005] 

Another method is the keyword method, which is an improvement over the 
aforementioned index method in terms of ease of use. A plurality of keywords descriptive of 
certain features of the data are appended to each data and are used in the retrieval process. This 
method is adopted by many multimedia databases. 

[0006] 

In addition, there is the pattern-matching method. Due to developments in recognition 
technology, multimedia information can be input into the computer and the contents recognized 
and analyzed. As a result, sound data can be retrieved by using a sound as the retrieval condition. In 
this retrieval system, because there is no need to go through the stage of conversion between 
media as would be required in the case of keyword retrieval (conversion from audio data to text 
data, or from text data to audio data), it allows more appropriate sound information retrieval. For 
example, with a Hyperbook of bird pictures, the user can input the retrieval condition by 
mimicking the song of the bird to be retrieved. The mimicked song of the bird is analyzed, and 
its sonic characteristics in terms of amplitude structure, pitch structure, frequency component 
variation structure, etc. are compared with the sonic characteristics of the 
birdsong data that have been registered to determine the distance function. 
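
The comparison step can be illustrated with a small sketch. It assumes that each song has already been reduced to a short numeric feature vector and that the distance function is the Euclidean distance; the text does not specify either, and the names `feature_distance`, `best_match`, and `registered_songs` are hypothetical.

```python
import math


def feature_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(query, registered):
    """Return the name of the registered entry closest to the query."""
    return min(registered, key=lambda name: feature_distance(query, registered[name]))


# Hypothetical features: (amplitude, pitch, frequency-variation) summaries.
registered_songs = {
    "bird_a": (0.8, 0.3, 0.5),
    "bird_b": (0.2, 0.9, 0.1),
}
```

The quality of the match depends entirely on how accurately the user can mimic the song, since the query vector is derived from the mimicked sound.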

[0007] 

Problems to be solved by the invention 

The aforementioned linking method appends the multimedia to the alphanumeric data, 
making it difficult to perform a direct retrieval, which is undesirable. 

[0008] 

With the index method, on the other hand, if the user does not know the file number or 
data name of the information to be retrieved, retrieval is very difficult. For example, if the user 
does not know the corresponding relationship between the music selection title and the track 
number on a CD, it is difficult to retrieve the desired music selection (whose title is not 
remembered). 

[0009] 

With the keyword method, it is impossible to use keywords to produce a complete 
representation of the information contained in the audio data and image data. For example, it is 
difficult to formulate in words a complete representation of the music of "Clair de lune" by 
Debussy or the atmosphere of the painting of "Melancholy and Mystery of a Street" by Chirico. 
No matter how hard the user tries to describe the music and atmosphere in words, there is no way 
that the data and picture can be obtained and no way to check whether the retrieval results are 
appropriate. 

[0010] 

The pattern-matching method also has a problem associated with inputting the retrieval 
condition as a sound. For example, although one may try to mimic the song of a bird, it is 
difficult to represent it accurately. 
[0011] 

Means to solve the problems 

The purpose of the present invention is to solve the aforementioned problems of the prior 
art by providing an interface for a novel retrieval system that uses virtual reality. In the 
aforementioned novel system, with the interactive sound field interface, the user sends his/her 
movement to the interface by means of a mouse or other device. In response, the interface 
provides the sound field that corresponds to the movement, so that the user gets a sense of 
himself/herself moving in virtual space. The next operation takes place in accordance with the 
movement of the sound field. In this way, the user can move around in virtual space while 
exchanging information with the interface. Because movements can be made in virtual space, the 
system is named the interface with interactive sound field (hereinafter referred to as ISF). 

[0012] 

In the ISF, the audio data registered in the audio database are placed in a virtual sound field, 
and the user moves in the sound field space using the direction and distance of the sound 
emitted by the audio data as cues; it is thus a retrieval interface by which the user can reach the desired 
information. By accessing certain information, it is possible to display not only the sound 
information of the data, but also image data and other media information on the display screen. 

[0013] 

That is, the present invention provides a sound information retrieval system characterized 
by the following facts: by means of a retrieval interface equipped with a virtual sound field space 
and a visualizing means corresponding to the plurality of sound sources of the sound field space, 
the desired sound source is accessed from said plurality of sound sources on the basis of the 
direction and distance of the desired sound source; by moving in the image and sound field space 
by means of a mouse or the like, the visualizing means appears at each prescribed point; and 
when the desired sound source is reached, the image of the sound source and its characteristics, 
etc. are output. 

[0014] 

Also, the virtual sound field space is a space for setting sound sources within the human 
audible range, the virtual sound field space is arranged in a hierarchy, and the accessed region 
can be narrowed down. 
[0015] 

In addition, the visualizing means includes a radar window, a bird's-eye view window, a 
3D window, and a data display window. 

[0016] 

Also, the present invention provides a sound information retrieval device characterized 
by the following facts: a mouse inputting part which inputs instructions from the user, an image 
interface part that stores and provides images, and an audio control part for controlling 
pronunciation are connected to a main controller which controls the overall system; an audio 
output part is connected to said audio control part; and an image output part is connected to the 
image interface part. 

[0017] 

As the sound information retrieval interface, the ISF has the following characteristic 
features: 

1. There is no need to input the retrieval condition. 

(1) No clear retrieval conditions are required. 

(2) Media conversion of the retrieval conditions is not required. 

(3) Information retrieval can be performed by browsing. 

(4) It is a compact interface. 

2. The sound image has left/right sensation of position and sensation of distance. 

3. A plurality of sound data can be heard simultaneously. 

4. It is possible to produce the data space corresponding to the application. 

5. It is possible to form a hierarchical structure of the sound information. 

[0018] 

In the following, a detailed explanation will be given regarding said characteristic 
features. 

[0019] 

As explained above, there is the advantage that information can be obtained even when 
the retrieval condition is not input. This results in several advantages. For example, because there 
is no need for a retrieval condition of the kind required by database retrieval in the prior art, 
retrieval can still be performed if the user merely has an image in mind that asserts 
"this is actually the sound." 
[0020] 

Also, because sound is heard as the information is retrieved, it is possible to retrieve the 
sound information without converting the retrieval condition to another medium, as is required in 
keyword retrieval. 

[0021] 

It is well known in the field of image interfacing that browsing refers to a retrieval 
method in which the desired information is retrieved while "searching" through information in 
the same way as a reader browses a book to find certain information. In this case, one may come 
across useful information while browsing. As a matter of fact, all of us have had the experience, 
while searching for specific information, of accidentally finding other useful information. Such 
accidental discoveries of information would be impossible with the conventional database 
retrieval operation. Browsing is effective as a human interface and can also be effectively 
applied to sound information. 

[0022] 

In the conventional database system, the interface for retrieval is very complicated. In 
many cases, the operation for inputting the retrieval conditions is complicated. Setting the 
retrieval conditions normally requires character input, although a system has been 
proposed in which all of the operations are performed using a mouse. However, it still requires 
the operation of setting the retrieval conditions. With the ISF, by contrast, it is only required that 
the user instruct the system by using an input device to indicate where the user wants to move in 
the sound space. At present, the mouse is used as the input device. However, any scheme may be 
adopted as long as the direction can be indicated. For example, one may use the number keys on 
the keyboard or a data glove for the operation. 

[0023] 

The sound interface of ISF also provides the left/right sensation of position and near/far 
sensation of distance. Such a sound field with a sense of stereo can realize an environment in 
which the user can hear the sound data in the natural form. As a result, the user can listen and 
discern a plurality of sound data emitted at the same time. 

[0024] 

Humans have the ability to pick out a specific sound in an environment in which there 
exists a plurality of sounds. For example, suppose that a person is at a cocktail party where many 
people are talking loudly; if someone suddenly says that person's name, the person will 
immediately turn his/her head towards the speaker. This is called the "cocktail party effect." The 
cocktail party effect is particularly pronounced when the individual sound sources have sound 
images with unique left/right position and distance positioning. In this case, by 
reproducing the individual left/right positions and distance positions for the several data 
registered in the sound database, it is possible to effect identification and comparison. 

[0025] 

When sound data are arranged in a virtual sound space without any order, 
the user will become confused. In this case, one may consider staging a performance 
that mimics certain sounds in the virtual sound space. 

[0026] 

In ISF, various performances can be made. As an example of the present invention, 
suppose the system mimics nature in "ISF insect pictures." This space contains meadows, forests, 
and flowing rivers. In the virtual space composed of their sounds, the sources (sound data) are 
arranged to mimic "insects in the meadow" and "birds in the forest." Consequently, the virtual 
sound space in the ISF contains sources (sound data = insects, etc.) and objects (things other than the 
sound data, including those that do not make sounds = trees, rivers, etc.). The user can walk in 
the virtual sound space to get the information. 

[0027] 

Although humans have the ability to exercise the selective hearing of the cocktail party 
effect, it is impossible to hear and discriminate hundreds and thousands of sounds at the same 
time. On the other hand, there are also hardware restrictions. In the test system assembled 
here, the limit is that up to 8 types of sounds can be sent independently to the audio line. 
Consequently, the types of sounds that can be generated simultaneously are 
limited. For practical application, it is therefore preferred that a hierarchical 
structure of the sound information be formed. 

[0028] 

For example, the sound information may be divided into several groups beforehand, and 
the sound information belonging to a certain group is positioned in a range up to a certain 
position in the virtual sound space. When the user is outside a certain group, the representative 
sound of the group can be heard as the sound image position at the center of gravity of the group. 
The representative sound of the group may be a single sound or all of the sounds of the entire 
group. Only when the user enters the group are the individual sounds in 
the group generated, each with its own sound image position. As a result, a large number of pieces of sound information 
can be handled. 
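
The grouping rule can be sketched as follows. This is a minimal illustration under stated assumptions: the group region is modeled as a circle of a given radius, positions are two-dimensional, and the names `centroid` and `audible_sources` are hypothetical.

```python
import math


def centroid(points):
    """Center of gravity of a list of (x, y) source positions."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def audible_sources(user, group_center, group_radius, members):
    """Return the sound image positions to render for one group.

    Outside the group, a single representative image is placed at the
    center of gravity; inside, each member keeps its own position.
    The circular group region is an assumption made for this sketch.
    """
    inside = math.dist(user, group_center) <= group_radius
    if inside:
        return list(members)
    return [centroid(members)]
```

With this rule, the number of simultaneously rendered sound images outside a group is one per group, which keeps the total within the limit of the audio hardware.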

[0029] 

For example, among insects, a group of cicadas is formed. When the user approaches 
from outside the group, the user can hear a chorus of chirps from the cicadas. Then, once the 
user enters the group, the user can hear the individual cicadas chirping at their respective sites. 

[0030] 

A method using an effector may be adopted in assembling said ISF. In this method, the 
sensation of distance of the sound can be represented in a virtual way. As a result, in the window, 
it is possible to display the window and the back-and-forth relationship of the icon as the 
sensation of distance. For example, in order to represent a remote sound, the effect of the 
reverberation (residual sound) is increased. On the other hand, when a sound from behind 
is to be represented, a low-pass filter or the like is used. With this method, in addition to the 
left/right sound volume difference, the sound image position in the left/right direction can be 
realized by means of the time difference between the left/right signals. 

[0031] 

With said effector method, it is possible to exchange information with the computer by 
means of the interface of the MIDI specifications, and control can be performed easily, which is 
advantageous. 

[0032] 

However, when the ISF of the present invention is assembled, in addition to said effector 
method, one may also adopt the correlation coefficient variation method in which the correlation 
coefficient between the right signal and the left signal is changed continuously from 1 to -1. 
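
One way to obtain a left/right pair with a chosen correlation coefficient is to mix two independent noise signals. The sketch below is an illustration only and is not taken from the original text: it assumes Gaussian white noise as the test signal, and the names `correlated_pair` and `correlation` are hypothetical.

```python
import math
import random


def correlated_pair(rho, n, seed=0):
    """Generate left/right noise signals whose correlation is approximately rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    k = math.sqrt(1.0 - rho * rho)
    left = a
    # Mixing the independent signal b into the right channel lowers the correlation.
    right = [rho * x + k * y for x, y in zip(a, b)]
    return left, right


def correlation(x, y):
    """Sample Pearson correlation coefficient of two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)
```

Sweeping `rho` from 1 to -1 moves the pair from identical signals, through uncorrelated signals at 0, to signals of opposite phase.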

[0033] 

In a binaural system, sound is recorded by means of two microphones placed at portions 
corresponding to the eardrums of the two ears of a dummy head that acoustically mimics the 
human head. When the user hears the recorded sound over headphones, the 
binaural system reproduces the position of the sound image as perceived at the position of the dummy 
head. In particular, in the binaural system adopted at present, the dummy head itself is not used; instead, 
the transfer function of the acoustic properties of the dummy head is measured, and a digital 
filter representing it is used to realize the binaural effect. With this system, it is possible to 
realize sound image positioning in the back-and-forth and left/right directions. This is a 
characteristic feature of this scheme. 
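
The digital-filter form of the binaural system amounts to convolving the source signal with a pair of impulse responses, one per ear. The following is a minimal sketch, assuming the measured transfer functions are available as short finite impulse responses; the names `convolve`, `binaural_render`, `hrir_left`, and `hrir_right` are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_render(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right impulse-response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

The nested loop makes the cost proportional to the product of the signal length and the filter length, for each ear and each source, so the computation grows quickly with longer filters.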

[0034] 

The aforementioned systems each have their advantages and disadvantages. 
Consequently, when the invention is embodied, a system that matches the characteristics of the 
object is adopted. For example, it is preferred that assembly be performed using the effector 
method for retrieval of insect sounds, which contain many high-frequency components. 

[0035] 

Also, in the binaural system, the time required to compute the convolution integral 
with the digital filter becomes very long, and a real-time reaction for the overall system 
cannot be realized. If this problem can be solved, this method can also be adopted. 

[0036] 

In summary, there are no restrictions on the assembly system of the present invention. 
One may adopt the appropriate system for the given object sound source. 

[0037] 
Operation 

According to the present invention, because the desired sound is focused on from among 
a plurality of sounds, there is no need to input the retrieval value and other symbols beforehand. 

[0038] 

Also, one may adopt a scheme in which the virtual sound field space and a visualizing 
means are combined to improve the user's on-site sensation, so as to facilitate his making an 
appropriate selection of the desired sound. 

[0039] 

Then, by using a plurality of visualizing means at the same time, it is possible to move 
towards the desired sound source, so as to narrow down the desired sound source. 

[0040] 

Application examples 

In the following, an explanation will be given regarding an application example of the 
present invention with reference to Figures 1 and 2. 
[0041] 

Figure 1 is a diagram illustrating the constitution of the hardware in the present invention. 
In this constitution, the main controller is connected to the mouse inputting part, the image 
interface part, and the audio control part. The audio control part is connected to the audio output 
part, and the image output part is connected to the image interface part. 

[0042] 

Figure 2 is a diagram illustrating the equipment on the basis of the constitution shown in 
Figure 1. In this case, a SPARCstation is connected via RS-232C to an NEC PC-9801 personal 
computer; the PC-9801 is connected via MIDI to the EPS-16 sampler; the EPS-16 sampler is connected via 
MIDI and audio to the DMP-11 MIDI mixer; and the DMP-11 MIDI mixer is connected via audio to 
stereo headphones. 

[0043] 

Said SPARCstation (I) is a device used to control the entire system. Here, the visual 
information is provided by means of the display, and the mouse is used to move around in the sound 
field space. 

[0044] 

Said DP/4 can independently process each of the independent audio signals of the 4 
channels. Such processing can be performed in real time by means of MIDI signals (Figure 2). 
Because the correlation coefficient variation method is used, phase control of the audio signal is 
performed for control of the sensation of distance of the sound image. 

[0045] 

Besides the audio signal processing described above, it is possible to provide phase 
shifter callers for reverberation (reflected sound effect) and delays for realizing the sensation of 
distance and broadening of the sound image, and it is possible to provide equalizer functions for 
changing the sound quality, etc. 

[0046] 

Also, EPS-16 is the sound source of the source object, and it can be controlled by the 
MIDI signal. In a IrMB memory, up to 32 types of sounds can be digitally recorded, 16 types of 
sounds can be reproduced simultaneously, and audio signals can be output from the independent 
lines of 8 channels. The sampling rate of said EPS-16 is up to 44.8 kHz, and it is possible to 
obtain the same sound quality as that of CD. 

[0047] 

DMP-11 is used to combine the plurality of sound images obtained with said EPS-16 and 
DP/4 into a single sound field. Here, DMP-11 is an 8-channel digital mixer that allows control with 
MIDI signal. The analog sound lines of 8 channels are A/D-converted when input, and then the 
unmodified digital signal is sent to a DSP for processing. Consequently, it is possible to suppress 
the degradation of the sound quality and to suppress generation of noise, and, at the same time, it 
is possible to realize panning (left/right sound image positioning), reverberation (reflected sound 
effect), equalization (change in frequency component), etc. independently for each channel 
(Figure 2). 

[0048] 

As explained above, it is preferred that said MIDI equipment be used and the workstation 
support the aforementioned functions. However, in the present development state, only a portion 
of the function can be provided. For example, instead of a sampler, it is possible to store the 
audio data in a peripheral storage device of the workstation. However, the audio data cannot be 
output from two or more channels as a plurality of independent lines. In the future development 
of multimedia, when said functions are incorporated as a portion of the workstation, they can 
also be adopted in the present invention. 

[0049] 

There are the following four parameters pertaining to the sound image positioning in the 
sound image control in said application example. 

[0050] 

1. The volume is inversely proportional to the square of the distance between the user and 
the source object. 

2. For left/right positioning, panning is used to divide the sound image into 32 directions 
spanning 360°. 

3. For the sensation of distance, reverberation is used so that the greater the distance, the 
higher the reverberation level. 

4. The sound that should be heard from behind the user is that obtained by means of a 
low-pass filter. 
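
The four parameters listed above can be computed from the user position, the user heading, and the source position. The sketch below is only an illustration under several assumptions that are not in the original text: headings are measured clockwise from the +y axis, the distance is clamped at `ref_distance` to avoid division by zero, the reverberation level grows linearly up to `max_distance`, and "behind" means a relative bearing strictly between 90° and 270°. The function name and its parameters are hypothetical.

```python
import math


def sound_image_params(user, heading_deg, source, ref_distance=1.0, max_distance=100.0):
    """Compute volume, pan sector, reverb level, and low-pass flag for one source."""
    dx = source[0] - user[0]
    dy = source[1] - user[1]
    distance = max(math.hypot(dx, dy), ref_distance)
    # 1. Volume inversely proportional to the square of the distance.
    volume = (ref_distance / distance) ** 2
    # 2. Bearing relative to the heading, quantized into 32 sectors of 11.25 degrees.
    bearing = (math.degrees(math.atan2(dx, dy)) - heading_deg) % 360.0
    pan_sector = int(round(bearing / (360.0 / 32))) % 32
    # 3. Reverberation level rises with distance (linear law assumed here).
    reverb = min(distance / max_distance, 1.0)
    # 4. Sources behind the user are routed through a low-pass filter.
    low_pass = 90.0 < bearing < 270.0
    return {"volume": volume, "pan_sector": pan_sector,
            "reverb": reverb, "low_pass": low_pass}
```

In a system of the kind described, each returned value would then be translated into the corresponding control message for the mixer or effector.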
[0051] 

When the correlation coefficient variation method is used to generate the aforementioned 
sensation, relatively good results can be obtained as far as white noise is concerned. However, in 
this application example, in which the sounds of insects are retrieved, a good sensation of distance 
cannot be obtained with that method. Consequently, in this application example, the method using an effector is 
adopted. The reasons why the correlation coefficient variation method gives unfavorable results 
for the sounds of insects are as follows. 

[0052] 

That is, with the correlation coefficient variation method, the sensation of distance appears 
significantly for a sound image moving in the back-and-forth direction; the sensation 
of distance depends on frequency and is less pronounced in the higher 
frequency region; and, when practically adopted in broadcasting, etc., the method is combined with other 
methods for conveying the sensation of distance, etc. 

[0053] 

Consequently, the method best suited to the embodiment of the invention depends on the 
properties of the sound source. 

[0054] 

The MIDI data that control said DMP-11 and EPS-16 are individual 6-byte information 
[units], and the structure shown in Figure 3 is adopted. That is, the first and second bytes are the 
channel numbers assigned to the controlling MIDI equipment. For example, the channels are 
assigned as follows: DP/4 is assigned to ch 1, EPS-16 plus is assigned to ch 2, and DMP-11 is 
assigned to ch 3 and ch 4. The third and fourth bytes are the numbers of the functions that 
should be controlled, and the fifth and sixth bytes are the parameters of the function. Their 
values are determined on the MIDI equipment side and are adopted as specified. 
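
A 6-byte control record of this kind can be sketched as three two-byte fields: channel, function number, and parameter. This is an assumption-laden illustration: Figure 3 is not reproduced here, so the field boundaries and the big-endian 16-bit encoding used below are assumptions, and the names `RECORD`, `pack_control`, and `unpack_control` are hypothetical.

```python
import struct

# Assumed layout: three big-endian unsigned 16-bit fields, 6 bytes in total.
RECORD = struct.Struct(">HHH")


def pack_control(channel, function, parameter):
    """Build one 6-byte record from channel, function number, and parameter."""
    return RECORD.pack(channel, function, parameter)


def unpack_control(data):
    """Split a 6-byte record back into (channel, function, parameter)."""
    if len(data) != RECORD.size:
        raise ValueError("a control record must be exactly 6 bytes")
    return RECORD.unpack(data)
```

A fixed-length record lets the receiving side split the serial byte stream into messages without any delimiter, although a single lost byte would shift every following record.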

[0055] 

In said MIDI data generating part, the various parameters for positioning the sound image 
computed with the sound image position computing part are converted into MIDI signals in said 
data format. The MIDI data formed in said MIDI data generating part are output from the 
RS-232C port of the SPARCstation. In the current version, a handshake or other error-prevention 
procedure is not used. Consequently, the measure against communication errors is adopted in the 
MIDI signal relay part. In the present invention, the direction 
and sensation of distance of the sound source are judged through the sound interface. At the same time, as an auxiliary 
interface that represents that sensation, several interfaces using images are adopted. 
An example will be explained below. This image interface is realized on the X Window System 
by means of XView. Figure 4 is an example of the control window. 

[0056] 

This control window is the only window present on the initial picture when the system 
of the present invention is started. The mouse operation by the user is mainly performed in this 
window, and the mouse interface portion also belongs to this control window. 

[0057] 

Buttons (a-c) in this window open windows onto the various images; button (d) ends the 
system; and canvas (e) is used to sense the movement of the mouse. 

[0058] 

By moving the mouse in said canvas (e), the user can enter the sound space in this 
direction. Also, by means of the left/right buttons on the mouse, the user can change the direction 
stepwise in steps of 360/16 = 22.5°. That is, as shown in Figure 5, the user can move in the 
sound space by selecting any of the 16 directions. 
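
The stepwise change of direction can be sketched with an integer direction index from 0 to 15. The sketch is only an illustration: it assumes that index 0 points along the +y axis, that indices increase clockwise, and that a right-button click adds one step while a left-button click subtracts one. The names `STEP_DEG`, `turn`, and `advance` are hypothetical.

```python
import math

STEP_DEG = 360.0 / 16  # 22.5 degrees per button click


def turn(direction_index, clicks):
    """Rotate by a number of clicks; positive = right button, negative = left."""
    return (direction_index + clicks) % 16


def advance(position, direction_index, distance):
    """Move forward along the current heading (index 0 points along +y)."""
    angle = math.radians(direction_index * STEP_DEG)
    return (position[0] + distance * math.sin(angle),
            position[1] + distance * math.cos(angle))
```

Keeping the heading as an integer index avoids the accumulation of rounding error that repeated additions of 22.5° in floating point could otherwise introduce over many clicks.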

[0059] 

In the mouse interface part, the position of the user is changed by movement, and the 
information is transmitted to the window and audio control part. Also, in the mouse interface 
part, control of the position of the mouse is performed in the control window. If, after the mouse 
enters canvas (e) (Figure 4), the mouse comes out of the canvas as a result of movement in the 
sound field space, the position of the mouse is controlled to return to the center of the canvas. 
Of course, even when the mouse returns, there is still no change in the position of the user. That 
is, the user can get to anywhere in the sound field space regardless of the position of the mouse 
in the control window. As a result, the user even can close his eyes while moving by relying 
solely on the sound, or the user can open his eyes to watch another window while moving. 
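
The separation between the pointer position and the user position can be sketched as follows. This is a hypothetical illustration, not the original implementation: it assumes that pointer displacement maps one-to-one onto user displacement, and the reassignment of `self.pointer` stands in for whatever call the window system provides to move the real pointer. The class name `MouseTracker` and its members are hypothetical.

```python
class MouseTracker:
    """Accumulate pointer motion into a user position, recentering at the edge."""

    def __init__(self, canvas_width, canvas_height):
        self.size = (canvas_width, canvas_height)
        self.center = (canvas_width / 2, canvas_height / 2)
        self.pointer = self.center
        self.user = (0.0, 0.0)

    def on_motion(self, x, y):
        """Handle a pointer-motion event reported at canvas coordinates (x, y)."""
        dx = x - self.pointer[0]
        dy = y - self.pointer[1]
        # The user position changes by the pointer displacement only.
        self.user = (self.user[0] + dx, self.user[1] + dy)
        inside = 0 <= x < self.size[0] and 0 <= y < self.size[1]
        # Leaving the canvas recenters the pointer but leaves the user unchanged.
        self.pointer = (x, y) if inside else self.center
```

Because only displacements are accumulated, the reachable region of the sound field space is not bounded by the size of the canvas.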

[0060] 

In this application example, the radar window is a window that displays 
the sources as seen from above the virtual space (Figure 6); the objects are not displayed. 
Because the relative position between the source and the user can be judged, the mouse can be easily 
moved while watching the source. 
[0061] 

As shown in Figure 6, the user is displayed as a white spot on the radar screen, and no 
distinction is made according to the type of source. Also, the direction corresponding to the front 
side of the user is the upper direction of the vertical line (upper side on the screen), and there is 
no change even when the button on the mouse is clicked. For example, when the user clicks the
left button of the mouse to rotate to the left, the radar window rotates the sources to the
right around the user at the center, and the user is always kept facing forward (upper side of the
radar screen). On the radar window, a slider bar (Figure 6b) called "sensitivity" is attached, and it
is used to change the radar sensitivity. By changing this sensitivity, the range shown on the radar 
is changed, and its value is displayed on the side of the slider bar and on the upper side of the 
radar screen. 

[0062] 

A large value of sensitivity corresponds to a wider range shown on the radar, and a small 
value of sensitivity corresponds to a narrower range. The sensitivity can be changed as desired. 
Consequently, when the user is not near the source, the sensitivity can be set higher, and, as the
user approaches the desired source, the sensitivity can be set lower so that the position of the user
can be finely adjusted. Also, the dashed pattern of the horizontal line and vertical line of the radar
screen can be changed according to the sensitivity value so that the sensitivity can be judged
visually.
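
The radar display described above, in which the user stays at the center facing up and the sources rotate around the user, can be sketched as a coordinate transform. The following is an illustration only. The function to_radar and its parameters are hypothetical, the sensitivity value is assumed to be the radius of the displayed range in sound-space units, and the heading is assumed to be measured clockwise from the +y axis.

```python
import math


def to_radar(source_xy, user_xy, heading_deg, sensitivity, radius_px):
    """Map a source position to an (x, y) offset from the radar center.

    Positive y means "up" on the radar screen, i.e. in front of the user.
    Returns None when the source lies outside the displayed range.
    """
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    if math.hypot(dx, dy) > sensitivity:
        return None
    a = math.radians(heading_deg)
    # Rotate the world counter-clockwise by the heading so that the
    # direction the user faces always lands on the upward vertical line.
    rx = dx * math.cos(a) - dy * math.sin(a)
    ry = dx * math.sin(a) + dy * math.cos(a)
    scale = radius_px / sensitivity  # larger sensitivity = wider range shown
    return rx * scale, ry * scale
```

With this convention, a left turn of the user (heading decreased by 22.5°) moves a source that was straight ahead to the right on the radar, which matches the behavior described for the left mouse button.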

[0063] 

In this application example, the bird's-eye view window (Figure 7) is the counterpart of
the radar window. Without displaying the sources, only the objects are displayed,
and it shows the scene on the periphery of the user as seen from above.

[0064] 

In this window, the user moves towards the source at the site of the source (such as in the
forest for cicadas).

[0065] 

In this window, the user is positioned at the center of the screen, as in the radar window, 
and the upper direction of the screen is in agreement with the front side of the user. The 
bird's-eye view window becomes the field of view of the user corresponding to the 5 objects on 
the periphery as well as back-and-forth and left/right sides of the user from among the map 
corresponding to, e.g., the 50 x 50 map data. Also, scrolling can be performed by the user
moving the mouse, and rotation can be performed by clicking the mouse. 

[0066] 

In the embodiment, for example, as shown in Figure 8, 16 maps of 50 x 50 are drawn in 
16 directions on the canvas, and a bird's-eye view window shows some of them. As the user 
clicks the button of the mouse to input the information, it jumps to the map corresponding to the 
direction. As a result, the front side of the user remains unchanged at all times. Although
preparation of 16 types of maps is unfavorable in terms of memory efficiency, it
nevertheless reduces the time for reloading. Also, no snow phenomenon takes place
for the picture shown in this case. In this method, it is possible to realize real-time operation. The
bit maps used in the aforementioned bird's-eye view are created with the icon editor, as shown in
Figure 9. In practice, the size of the object as seen by the user becomes 16 x 16 = 256 dots.
Because the bit map is square, if 16 x 16 is used, gaps appear on the map when the
direction which the user faces is other than a multiple of 90°. When
a 24 x 24 bit map is used, the objects in the superimposed portions are ORed, so the
presence of said gaps is excluded.

[0067] 

In this application example, the 3D window (the window of the three-dimensional object) 
(Figure 10) displays only the scene on the periphery of the user, as in the case of the bird's-eye 
view window. Here, three-dimensional objects are used to render the scene near the line of
sight of the user so as to improve the sense of presence. Also, in this window, the background is
displayed so that the user can sense and recognize the direction. The background
that enters the picture of the window is the range corresponding to a 90° angle of
the field of view of the user, and the objects entering this field of view are displayed on the screen.
Because there are 16 directions that the user can take, 16 backgrounds are prepared, and each
click of the mouse shifts the background by a quarter of the screen.
According to the distance to the user, 4-6 types of objects are prepared. All of the objects and the 
backgrounds are prepared by means of the icon editor in the same way as in the case of the 
bird's-eye view window. 
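
The relationship between the 90° field of view and the quarter-screen background shift can be checked with a short sketch. It is an illustration under assumptions: the function names are hypothetical, bearings are measured clockwise from the +y axis, and objects are placed on the screen by a simple linear mapping of angle to horizontal position, which the text does not specify.

```python
import math

FOV_DEG = 90.0          # angle of the field of view of the user
STEP_DEG = 360 / 16     # rotation per mouse click (22.5 degrees)


def bearing_offset(user_xy, heading_deg, obj_xy):
    """Signed angle in degrees from the user's heading to the object."""
    dx = obj_xy[0] - user_xy[0]
    dy = obj_xy[1] - user_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))  # clockwise from +y
    return (bearing - heading_deg + 180.0) % 360.0 - 180.0


def screen_x(offset_deg, screen_width):
    """Horizontal pixel position, or None if outside the field of view."""
    if abs(offset_deg) > FOV_DEG / 2:
        return None
    return (offset_deg + FOV_DEG / 2) / FOV_DEG * screen_width


def pixels_per_click(screen_width):
    """Background shift per click: 22.5 / 90 = one quarter of the screen."""
    return STEP_DEG / FOV_DEG * screen_width
```

For a window 400 pixels wide, one click shifts the background by 100 pixels, i.e. a quarter of the screen.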

[0068] 

Because said background is set at an infinite distance, it does not change as long as the 
direction of the user is not changed. However, the user can move in this sound field space by 
relying on the background and the displayed object. 

[0069]

In this application example, the data display window is the window that is opened only 
when the user approaches the source. That is, the user cannot open the window at will (by 
clicking the button on the control window). In this window, the image information and text 
information of the source are displayed. For example, when the present invention is adopted for
an insect picture collection, as shown in Figure 11, the image of a "bell-ring" insect
and its data are displayed in the data display window.
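
The condition for opening the data display window can be sketched as a simple proximity test. This is an illustration only: the function name, the dictionary of sources, and the threshold open_radius are hypothetical, since the text does not state how close the user must be to a source.

```python
import math


def source_to_display(user_xy, sources, open_radius):
    """Return the name of the nearest source within open_radius, else None.

    sources maps a source name to its (x, y) position in the sound field space.
    """
    best_name = None
    best_dist = open_radius
    for name, (sx, sy) in sources.items():
        dist = math.hypot(sx - user_xy[0], sy - user_xy[1])
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name
```

The window would be opened when the function returns a name and closed when it returns None, so the user never opens it directly.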

[0070] 

In said application example, when the start program of ISF is executed, the following data files
are read to make various settings.

[0071] 

(1) The source data file is a data file of the position information of the source, the volume 
of the source itself, and the directionality of the sound.

(2) The map data file is a data file of the positions of objects used in the bird's-eye view 
window and the 3D window. 

(3) The icon data file is a data file of the objects and background icons used in the 
bird's-eye view window and 3D window. 

(4) The image/text data file includes the image information and text information 
displayed on the data display window. 

[0072] 

By rewriting said data files, the system can be adapted to various applications.

[0073] 

Effect of the invention 

According to the present invention, by means of an interface having a virtual sound field 
space and a visualizing means corresponding to the plurality of sound sources in the sound field 
space, it is possible to retrieve the desired sound source by manipulating the mouse or other
device. Consequently, there is no need to input the alphanumeric data that used to be necessary
for retrieval, and the sound source can be retrieved by means of sounds and images.
Consequently, even when the data for the desired sound source is not confirmed, it is still
possible to display the correct data by listening to the sound to reach the sound source.

[0074]

Also, by adopting a hierarchical structure for the interface, information in much larger
quantities can be arranged reasonably, and it is possible to obtain the information determined by
means of the same retrieval.

Brief explanation of figures 

Figure 1 is a diagram illustrating the ISF system constitution of the invention.
Figure 2 is a diagram illustrating the constitution of the hardware system of said ISF system.
Figure 3 is a diagram illustrating the MIDI data format of said ISF system.
Figure 4 is a diagram illustrating the control window of said system.
Figure 5 is a diagram illustrating the mouse interface of said system.
Figure 6 is a diagram illustrating the radar window of said system.
Figure 7 is a diagram illustrating the bird's-eye view window of said system.
Figure 8 is a diagram illustrating the bird's-eye view window depicted on the canvas of said system.
Figure 9 is a diagram illustrating the icon for depicting the bird's-eye view of said system.
Figure 10 is a diagram illustrating the 3D window of said system.
Figure 11 is a diagram illustrating the data display window of said system.